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1. INTRODUCTION 

In the last decade, the use of mobile phones and personal digital assistant (PDA) devices increased 
tremendously. The growth of mobile phones has increased over the period of time with tremendous 
improvement at technology side. Be it network growth from 2G to 5G and handset evolution from feature 
phone to powerful smartphones. As per the report from GSMA [1], there are 5.2 billion subscribers globally. 
However, the growth of mobile phones has also increased the mobile related thefts and fraud rates with 183 
smartphones are stolen everyday between March 2015 and March 2016 in UK itself [2]. Mobile phones become 
potential targets as most of the transactions nowadays take place through them from the management of bank 
accounts to the buying and selling of stocks. This raises a potential question regarding the security of the mobile 
phone [3], [4]. Nowadays, different authentication approaches have been used on handheld devices to ensure 
the security of content [5], [6]. Some of them are password, fingerprint, iris, face, and pattern. These approaches 
are now the mainstream authentication techniques used across all the handheld and portable devices. The 
demerit of the existing approaches is that they are prone to shoulder surfing, guessing attacks, brute force 
attacks and dictionary attacks. Shoulder surfing is a password attack in which the user’s password is 
compromised by peeping in the password entry screen while the actual user types in the password [7]. 


Journal homepage: http://ijece.iaescore.com 


Int J Elec & Comp Eng ISSN: 2088-8708 o 803 


Relatively newer authentication method, that is pattern-based authentication in touchscreen devices, is also 
prone to the finger marks and smudges which can be used to lift the pattern sequence. As per recent report from 
counter point research, there is going to be a huge demand for the feature phone market as mentioned in [8]. 
Globally, the feature phone segment is forecast to generate around US $16 billion cumulatively in wholesale 
hardware revenues over the next three years. There affordability of feature phones is one of the major reasons 
why feature phones are the preferred mobile phone in many developing countries like India, Africa and 
Bangladesh. The market reports from counterpoint and shipment opportunity for feature phones there is a 
strong need to have robust security system [9], [10] without any additional hardware cost. In our proposed 
work, we have developed one of authentication solution based on behavioral keystroke dynamics from the 
user’s learned machine learning model for feature phones. 

To address the security issues in the feature phones, new age authentication technique uses biometrics 
as a means to identify the actual user. Biometric authentication [11] recognizes individuals by depending on 
their behavioral or physiological features like; fingerprints, iris, voice and signature. To deploy the fingerprint 
scanner or iris to the feature phone, it requires additional cost, memory and high computing power in the device 
which tend to increase the cost of the phone, therefore, such system is not feasible option in feature phone. 
Also, the conventional approaches are one-time authentication methods, in which a user can access the sensitive 
content if he or she logs the system with the correct password. Post acceptance of the password, there is no 
commercially available authentication system to ensure the continuity of security for that sensitive content in 
feature phones based on behavioral biometrics. Once the user is approved, there is no further screening of user 
which can help to detect if imposter is operating the device. Face, fingerprints and iris are examples of biometric 
solutions available in smartphones to offer high-end secure access control. In order to provide such biometric 
systems, it requires additional cost to mobile device because of additional equipment and hardware costs like 
fingerprint scanner, iris sensors requirement. Also, fingerprint and iris require high memory and computing 
power to perform and execute. Moreover, in case of feature phones which are low cost there is no support of 
biometric based authentication. To overcome this issue efficiently keystroke biometrics has been utilized in 
literature. Keystroke biometric authentication is behavioral based and it uses user’s key input patterns and it 
based on the fact that each individual user’s typing patterns are unique and consistent. 

Many approaches to authenticate a device by keystroke biometrics have come in foreplay. Clarke and 
Furnell [12] analyzed user authentication based on keystroke input patterns on handset devices. They have 
utilized the key input of 11-digit phone numbers and 4-digit password to classify individuals. Their models 
were based on the generalized regression networks with accuracy of equal error rate (EER) ranging from 9% 
to 16%. Hwang et al. [13] achieved EER of 13% when applying the arthematics rhythms with Cues. They have 
utilized the key input 4-digit password to classify the users. Their models employed framework where only 
valid user’s patterns are used for training purpose. In their work 25 users participated and they collected only 
5 patterns from each user for enrollment. 

Motwani et al. [14] in their work, the dataset was dynamically generated and the impostors were not 
involved during the enrollment phase, false rejection rate (FRR) was 3.2% with only 27 features. Stanciu et al. 
[15] focused on effectiveness of sensor-enhanced keystroke dynamics, they have utilized movement sensors 
that is accelerometer and gyroscope in their work. In their work 20 users participated and they gathered 
keystroke and sensor samples in controlled environment on Samsung Nexus S device. In their work results 
suggest that basic keyboard authentication is prone to attacks and when sensors are considered they obtained 
better results against statistical attacks. Huang et al. [16] achieved EER of 7.5% with their work for smartphone 
device on Android platform. They utilized statistical classification technique in their model. In their work client 
side, they developed Android application to capture the keystroke data, and server-side system database and 
authentication engine was developed as web service. The total of 40 users aged from 22 to 55 years old 
participated in their experiments. 

It has been observed that existing security authentication mechanism in feature phones are based on 
personal identification number (PIN) or password characters. Current security authentication provided in 
feature phones are prone to the security attacks from imposters and fraudulent attackers. The basic concept 
with keystroke dynamics is the capacity of the method to understand the patterns like typing pattern during 
keyboard usage from the individual and then use this as a parameter to verify the user. A person typing on the 
mobile device will have the check if the time difference between typing of each of the letters of the password 
is similar to the owner’s typing pattern. In the proposed work, the typing pattern (keystroke modality) of the 
user is learned with the k-nearest neighbors (k-NN) and fuzzy logic. The experimental data was collected on 
Samsung On7 Pro C3590 and the model was trained on the desktop PC Windows 10, by dividing the data into 
training and validation set. The next part of the paper is organized as follows section 2 explains our work 
keystroke dynamics-based authentication, section 3 explains results and discussion and future work and 
conclusion are discussed in section 4. 
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2. KEYSTROKE DYNAMICS APPROACH FOR FEATURE PHONES 

The behavioral biometric [17], [18] technology proposed in the paper by analyzing the typing pattern 
of the user which is also known as Keystroke dynamics. Behavioral biometric is the field of study that uniquely 
measures patterns of human activities and thereby identifies the user. Behavioral biometric authentication 
methods include Keystroke dynamics, Touch dynamics, Voice, Signatures and Gait. Figure 1 shows the various 
known behavioral biometric methods. Behavioral biometrics provide secure authentication for banking and 
insurance applications, retail point of sales, and various other domain that need continuous authentication based 
on user’s interaction with system. The block diagram of the proposed keystroke dynamics authentication 
system is shown in Figure 2. The whole process of keystroke dynamics is divided in to the following three 
steps: i) enrolment, ii) model training, and iii) authentication. 
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Figure 2. Basic steps for the authentication process 


2.1. Enrollment (data collection) 

In proposed work, a total of 25 users aged between 22 to 42 years, have participated in the experiment. 
In order to capture the keystroke data input from the users, we have developed mobile application on Samsung 
On7 Pro C3590. In our experiments, 4-digit password “1976” was used and users were asked to enter the 
password 60-times during enrollment phase at Samsung India Noida R&D center, where one of the author is 
working. The data collection was done in two separate sessions for each user. Entire enrollment process took 
one week to collect the sample data from all the users. The keystroke data acquisition step comprises of building 
a character transition lists for particular chosen keyword. The duration of key-presses between every two 
characters are stored. This proposed work was designed to classify the users in feature phone based on the 
typing patterns while entering the 4-digit personal identification number (PIN) key, which is based on the hold- 
time of key press, flight time and the total time entering the PIN. Our work utilizes the following mentioned 
parameters while capturing the data from the user: i) keystroke latency (flight time): time taken between two 
consecutive keystrokes, ii) hold-time: time to press and release a key, and iii) total time: time to press first key 
press and last key release. The components of features that are utilized in this work are shown in Figure 3. 


Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 802-815 


Int J Elec & Comp Eng ISSN: 2088-8708 0 805 


Total Time (t) 


Flight Time(i) 
eo m e e 
Key Hold Key Hold Key Hold 
Time(d) Time(d) Time(d) 
-=--> lentil ae et = = = = 


Figure 3. Keystroke dynamics features 


The input captured during enrolment is in the following format for the keyword “1976” with 8 features 
{i1, i2, i3, di, do, d3, d4, t]. Where: 
dx : Time of press for key (hold time in milliseconds) 
ix : Time between first key release and next key press (flight time in milliseconds) 
t : Total time from first key press to last key release (in milliseconds) 


2.2. Model training 

In proposed work, with the help of data collected during enrollment phase typing pattern for a 
particular password is recorded and then model is trained with k-NN and fuzzy logic. Overall, 8 features are 
collected as shown in Figure 3 including hold time, flight time and total time from first key to last key release 
for the keyword “1976”. The input features obtained are then passed through the k-NN model and the fuzzy 
training model separately and both the models are then trained using the given input features. The 
authentication values are obtained separately from both the models and final authentication value is returned 
as shown in Figure 4. 
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Figure 4. Proposed keystroke model architecture based on k-NN and fuzzy 


2.2.1. K-NN model 

We have used k-nearest neighbor (k-NN) as a classification method because of its key application in 
pattern recognition and prediction. k-NN uses the k nearest neighbors of the feature value, in our case we use 
5-NN to search five nearest neighbors of user’s typing patterns. In this algorithm, we make use of Euclidean 
distance to know the distance between the claimed user typing pattern and the actual user features. 

Euclidean distance provides the direct distance between two points in space along the line joining the 
two points. It helps to find the shortest distance between two input feature vectors along the line segment 
joining the two input vectors. The Euclidean distance is calculated for all the points and then nearest points 
having the smallest distance are considered. Here is pictorial representation as shown in Figure 5 of how 
Euclidean distance is used in k-NN. 


d(z,z') = Vv (zı — z) + (z2 = Z’) Fet (Zy Z'a) (1) 


Where: 
Z : Training sample value 
Zz : Test sample value 


d(z,z) : Computes the Euclidean distance 
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In nearest neighbor model, the number of characters in keyword decide the number of dimensions in 
a hyper dimensional space. Each set of typing pattern of that keyword among multiple iterations is one point 
in that hyper dimensional space. Since we have collected multiple inputs to train the model, we have multiple 
clouded points plotted for a particular keyword once the classifier training is complete as shown in Figure 6. 
In the proposed work we have hold time, flight time and total time as feature set which is captured from user 
and model is trained for the 4-digit keyword “1976”. Refer Figure 7 shows the model is trained with data set 
which is comprised of training samples and genuine or accepted and the sample cases which are imposter. 
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Figure 5. k-NN using Euclidean distance Figure 6. Plot of user data captured keystroke 
inputs 
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Figure 7. Plot of training data set of a user and the accepted and the rejected data 


2.2.2. Fuzzy logic 

Fuzzy means something that is not clearly defined or has crisp values. Fuzzy logic is used in the cases 
where a crisp definition cannot be provided for a quantity, such as the amount of hot or cold. In our case, a user 
when typing can have variable typing speeds, classified as fast, slow or normal. A crisp definition of the 
quantity where the value changes from fast to normal to slow cannot be defined in this case. Figure 8 shows 
frequency of typing speed timings graphically for a typical user, the normal typing speed has maximum 
frequency which occurs in day-to-day life while typing, while the frequency decreases as moving towards 
timings that are categorized as fast or slow. The actual values of timing will vary from person to person and a 
crisp range cannot be defined between the three values. Such values introduce a degree of fuzziness and using 
methods such as k-NN fail to classify two users with overlapping typing speed. Such fuzziness is solved by 
using the concepts of fuzzy logic. Fuzzy logic is used in authentication systems based on biometrics to provide 
enhanced security as in case of biometrics a lot of data from different users can be overlapping. In case of keystroke 
dynamics-based authentication system, the input features are keystroke timings which can have varying values 
for a single user which may or may not overlap with another user timings. 

From Figure 8 we can see that it is not possible to define a single value as fast or slow typing speed 
for a given user. As explained in Figure 9 in the fuzzy logic model, inference engine which is responsible for 
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determining the output for a given input based on learned data, uses the input features of the keystroke timings 
which are first converted to the fuzzified input. Using the input keystroke timings, the rule base is generated 
which will be used to determine the timing similarities for a test input. Similarity gives a degree of closeness 
between one typing speed timing against the timings from learned users and is calculated by fuzzifying the test 
input first and based on the similarity values fuzzy output values are calculated. Finally, the fuzzy output can be 
converted to the crisp Authentication value using the available defuzzification functions such as centroid method, or 
normal max value, based on the current membership value, and center of sums. 
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Figure 8. Timings between keystrokes of a participant 
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Figure 9. Working of the proposed fuzzy logic model for keystroke dynamics 


A. Determining rule base for user classification 

The input to the fuzzy system is the key stroke timings from a user collected over time or may be a 
test input typing pattern that needs to be authenticated yet, these inputs need to be converted to the fuzzy inputs 
before they are passed through the inference Engine for further processing, for both learning and test phase. 

Following timings are taken into account when calculating the fuzzy membership input: i) time of 
press for key (hold time: dx); ii) time between first key release and next key press (flight time: ix); iii) total time 
from first key press to last key release (t). 

The input is in the following format for the 4-character keyword “1976” with 8 features [11, i2, 13, di, 
do, d3, d4, t]. Note that this feature set is particular to 4-character keywords only and will change with the 
number of characters used for learning. Here, i1, i2, i3 represent, Interval between releasing key and stroking 
next key (Milliseconds). Similarly, dı, d2, d3, d4 values represent Time period key during key remain pressed. 
Finally, last value represents, Time period first key press and last key release (Milliseconds) 

Multiple feature set will be obtained for a given user during the learning phase. The values averaged 
to obtain an average typing time for a single user for all the three types of timings. We get the following 3 
values for each input tuple for a single user. 


(d, i, t) = Oi dk , i İk, t) (2) 
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To train the user, 


Ek-od 
davg = a (3) 
; Sragi 
iang = ei a) 
Lkeot 
tavg = k=o k (5) 


n 


where, n: total number of input timing feature set for a single user obtained while training. 

Finally, we calculate an upper and lower limit of the typing speeds that can be used as rough estimate 
of the typing timing of a single user in day-to-day life. Threshold tı and tu are defined for each three inputs 
which give a rough estimate of minimum and maximum value of timing for a single user and is calculated as 
the mth standard deviation from the average value. This methodology helps to circumvent any outliers that 
may have occurred during data collection: 


For d: 
tia = dave —m * o(d) (6) 
tud= dave + M * o(d) (7) 
For i: 
ti = dave —m * o(i) (8) 
tui= Lave + m * a(i) (9) 
For t: 
tit = fave —m * o(t) (10) 
tu= tavg + M * o(t) (11) 


where, generally, m is 3, but for our purpose we have taken value of m to be | as using 3 can sometimes allow 
outliers to be falsely accepted and in case of typing it is possible that multiple users may have very overlapping 
timings and may lead to false acceptance. Here, m * ø (d) defines m™ standard deviation around the average 
value. tık and tuk represents the lower and upper learned threshold timings for the user, where k is d, i, t. Finally, 
these values of threshold will be used during the input fuzzification phase to generate input membership 
function for a user input. 


B. Input fuzzification 
The obtained threshold values in the previous step are then used to generate an input membership 
function. The input membership function is defined as: 


0 


sif tix S tk S tuk 
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where, k= d, i and t, respectively and the membership function is calculated for d, i and t respectively. The 
membership function calculations are partitioned based on the thresholds. The membership function will have 
degree | for normal when the timings are between upper and lower threshold value and zero for positive and 
negative. Similarly for cases when the timing is less than or greater than the threshold the degree of membership 
is calculated as shown in (12). 
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So, now we have three values defined for each timing. The membership graph for input is presented 
in Figure 10. The graph timings have been modified to allow for a range of user timings because a single user 
may have a varying range of timing from a general average. All the three keystroke timings will have a similar 
membership function. 

For any incoming test input [ti1, ti2, ti3, ta1, ta2, ta3, ta4, te], (d, i, t) is calculated as shown previously in 
(2). The membership values are calculated for three values d, i, t, using the membership function as mentioned 
in (12). For an incoming test input, we obtain three membership functions pa, Mi, Mt- 
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Figure 10. Fuzzy membership function for the input timings based on authenticated user input. 


C. Calculating similarity for a test input using inference engine 

After converting the input to fuzzified values the inference engine uses the rule base to determine the 
similarity for the input to the learned user timings. Based on the closeness of the features to the limits of the 
authentication values, similarity function can be defined as mentioned in (13), (14): 


_— (Aux) 
a rae, (13) 
Snorm = (s/||s|)/ normal (14) 
where, 
Uk : input membership function with k = d, i, t 
normal : means that membership value of the normal is considered only for similarity. 
S : represents the similarity, 0 <= s <=1. 
Snorm : represents the normalized values 


Here, summation is done separately for negative, normal and positive member values defined in the input 
membership function. 


D. Obtaining output membership function 

Finally, after applying keystroke dynamics timings and calculating the score the inference engine will 
calculate the possible authentication value based on the current learned preferences and based on a threshold 
of similarity ‘s1’, ‘sy’ the authentication values is generated. Where, sı is the lower limit and su is the upper limit 
for similarity thresholds. A similarity value below 0.9 times sı of means no authentication, while if similarity 
is greater than 1.1 times of s, the user is fully authenticated. Using a range instead of strict values of sı and su 
helps to achieve the desired fuzziness by removing any strict crispiness in the threshold values. For in-between 
values of similarity, authentication values are defined using the output membership function as shown in (15). 
Values for the sı and Su can be set based on the learning from the previous data. Typical values for sı are 
0.3-0.5 and for sy are 0.6-0.8. Similar, to the input function an output function can be defined as shown in (15): 
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Based on the similarity value s, membership function for the output op, is calculated as shown in (15). 
The graph for the output membership function is shown in Figure 11. Three values have been defined for 
authentication membership function: i) no authentication, ii) partial/strict authentication, and iii) full 
authentication. 


No Authentication = 
Partial/Strict Authentication 
ESER Full Authentication 


© Membership value —> + 


Reliability ————————_> 


Figure 11. Output fuzzy function obtained from keystroke input after applying inferences 


Instead of using crisp similarity thresholds for sı and su the authentication membership values are 
generated by varying over a range which helps to achieve the desired fuzziness in the output membership 
function as shown in the Figure 10. The partial authentication decreases as the reliability/similarity increases 
while at the same time partial authentication membership increases, similar is the case of partial and full 
authentication the membership values change gradually over a range of similarity values. 


E. Final crisp authentication value generation 
From this approach, finally the output can be converted into de-fuzzified output by taking the max of 
the three outputs as defined in (16): 


Output = max (a, b, c) (16) 


where, 

a : membership value for no auth 

b : membership value for partial auth 
c : membership value for full auth 


2.3. Authentication 

The final step post model training, is of authentication. Both the trained classifiers separately generate 
the authentication results which are then combined together generate the final authentication value. For k-NN 
the trained classifier is used to calculate the nearest distance of the test sample from all of the training samples 
in that hyper dimensional space. Once the nearest distance of the testing sample is calculated, it is checked 
with the permissible threshold value for that keyword and if the value is outside the limits of threshold, the test 
sample is marked as unrecognized typing pattern and the user is classified as imposter. When by recursive 
intruding the imposter matches the value within the permissible threshold, the imposter is falsely allowed the 
access. The permissible threshold solely depends on the duration between typing each consecutive letter. For 
the imposter to match the values in permissible threshold all these parameters should match the parameters that 
of the actual user. 

For fuzzy model the similarity is calculated for user and output authentication values is generated for 
the user. The user is considered to be authenticated if the authentication value is obtained as full authentication. 
A value of no or partial authentication is considered as no authentication in this case. Figure 12 shows basic 
flow chart when unknown user tries to access mobile app or log in to the device, even when he knows the user 
password the proposed system checks the behavioural characteristics of user input pattern on keystroke and 
dis-allows access to the imposters. This way it provided the second level of security to the user in feature 
phone. There are basically two phases in the authentication system-enrolment and login phase. In the enrolment 
phase the user keystroke dynamics are learned by the classifier. 

When setting a 4-digit PIN the timings for the user are captured feature vector is created with it. This 
Feature vector is then passed through the k-NN classifier for training. The fuzzy classifier also learns threshold 
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values for the lower and upper typing timings for the user. During the login phase when a test user actually 
enters the similar timing feature vector are obtained and passed through the learned classifier to obtain the k- 
NN authentication output. Similarly, the input is converted to a fuzzified input and then using the fuzzy 
inference converted to fuzzified output to finally obtain the crisp output result. 


Enrollment Login 


Registration 


Figure 12. Basic flow diagram of proposed keystroke dynamics 


3. RESULTS AND DISCUSSION 

The accuracy of the keystroke dynamics system is measured on the following counts: 

a) False rejection rate (FRR) is the measure of the percentage ratio between incorrectly reject authorized users 

against the total number of genuine users accessing the system. A lower false rejection rate means less 

reject cases and easier access by legitimate user. 

False acceptance rate or FAR is measure of the percentage ratio between falsely accepted unauthorized 

users against the total number of imposters accessing the system. Terms such as false match rate (FMR) or 

type 2 error refers to the same meaning. A smaller FAR indicates less imposter accepted. 

c) Equal error rate (EER) is used to determine the biometric system accuracy. When both FAR and FRR rates 
are equal that intersection point is EER. The lower the value of EER the higher the precision of the biometric 
system. 

Based on an experiment conducted to find accuracy of keystroke analysis, FAR i.e. rate of system 
giving access to an imposter. Experiment has been performed for 25 different users for different acceptance 
threshold FRR i.e. rate of system denying access to authorized user has been taken for 25 different users for 
different acceptance threshold. To conduct this experiment fluently, the authorized user took 4-digit PIN as 
input. Now, to know the EER value, we plotted the graph between FAR and FRR value. The intersection point 
of this graph so plotted gives us the EER value as shown in Figure 13. More precisely the closest match of 


b 


wm 
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FRR and FAR value is the EER. Table 1 represents a comparative evaluation of accuracy parameters performed 
between our proposed k-NN model and the improved version of our model when fuzzy logic is added alongside 
k-NN classifier to find the final authentication value. 
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Figure 13. Relation between FAR and FRR 
Table 1. A Comparison between simulations of different techniques 
User Age Gender k-NN k-NN with Fuzzy 
FAR FRR EER FAR FRR EER 
User1 27 Male 0.015 0.014 1.45% 0.017 0.015 16% 
User2 32 Male 0.023 0.05 3.65% 0.012 0.017 1.45% 
User3 27 Female 0.092 0.093 9.25% 0.013 0.017 15% 
User4 24 Female 0.031 0.014 2.25% 0.021 0.018 1.95% 
User5 28 Female 0.092 0.093 9.25% 0.01 0.011 1.05% 
User6 31 Male 0.023 0.021 2.20% 0.015 0.015 15% 
User7 42 Male 0.031 0.029 3.00% 0.013 0.015 1.39% 
User8 25 Male 0.008 0.021 1.45% 0.01 0.010 1.00% 
User9 29 Male 0.062 0.043 5.25% 0.010 0.019 1.44% 
UserlO0 22 Male 0.054 0.064 5.90% 0.012 0.015 1.35% 
Userl1 22 Female 0.026 0.032 2.9% 0.020 0.019 1.95% 
Userl2 23 Female 0.014 0.029 2.1% 0.010 0.012 1.1% 
Userl3 24 Female 0.026 0.005 1.6% 0.010 0.014 1.20% 
Userl4 25 Male 0.043 0.008 2.6% 0.018 0.020 1.90% 
Userl5 22 Male 0.005 0.02 1.46% 0.009 0.013 1.1% 
Userl6 23 Female 0.02 0.023 2.19% 0.012 0.020 1.6% 
Userl7 24 Female 0.023 0.017 2.0% 0.027 0.012 1.95% 
Userl8 23 Male 0.020 0.026 2.3% 0.020 0.020 2.0% 
Userl9 22 Male 0.011 0.029 2.04% 0.057 0.027 4.2% 
User20 23 Female 0.014 0.002 0.8% 0.034 0.035 3.45% 
User21 22 Male 0.035 0.023 2.9% 0.037 0.033 3.5% 
User22 22 Male 0.026 0.002 1.4% 0.021 0.025 2.3% 
User23 27 Female 0.017 0.017 1.75% 0.01 0.020 1.5% 
User24 25 Male 0.043 0.017 3.0% 0.025 0.03 2.75% 
User25 27 Male 0.038 0.023 3.0% 0.022 0.025 2.35% 


In this performance evaluation, we have found out that using only the k-NN model over the biometric 
input features had a limitation in that the model does not take into account the variance in the keystroke 
latencies among the multiple attempts of the single user. From the results, we can observe that k-NN model the 
average EER 3.03% and this is the best result we found with our experiments when the value of k was set to 5. 
We can observe that the EER has shown improvement and it’s decreased from 3.03% to 1.88% when fuzzy 
was applied with k-NN. Out of 25 users we have observed that 12 users the EER results were less than 1.5%. 
Best results were observed for user5 (female) with EER 1.05% with k-NN and fuzzy combined model. In 
Figures 14, 15 and 16 we have analyzed the FAR, FRR and ERR for 25 users with k-NN, k-NN with fuzzy 
methods. It can be observed from our experiments that k-NN and fuzzy model when combined have performed 
better for most of users and its results are superior from k-NN. 
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Figure 14. FAR with k-NN, k-NN with fuzzy for 25 Figure 15. FRR with k-NN, k-NN with fuzzy for 25 
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Figure 16. EER with k-NN, k-NN with fuzzy for 25 different users 


In Table 2 we have shared the comparison of the proposed work with existing work in keystroke 
dynamics from other researches is covered. In the proposed work, we have considered 60 patterns from each 
user and 25-users participated in experiments. Our study is competitive as its network trained with combination 
of k-NN with fuzzy method with EER rate of 1.88% when experiments were performed with 25 participants 
on feature phone Samsung on 7 Pro at Samsung India labs. 


Table 2. Comparison with existing work 


Study Input Data # Of # of Inputs Classifier EER (%) 
Participants Training 
Clarke and Furnell [12] 4-digit PIN 32 30 Neural networks 12.8% 
Hwang et al. [13] 4-digit PIN 10 5 Artificial rhythm with Cues Ee 
Wang et al. [19] 4-digit PIN 104 20 Support vector machine (SVM) 8.70% 
Chang et al [20] 200 words 114 3 Statistical classifier 7.89% 
Mondal et al [21] All keys óf 53 7*105 ANN and CPANN 2.35% 
keyboard 
Lee et al. [22] 6-digit PIN 22 100 Marihattan and Buçlidean 7.89% 
Distance 
Kim et al. [23] 6-digit PIN 6 100 Statistical classifier 13.44% 
Frolova et al. [24] alphanumeric 15 30 LOE, Manhattan and Euclidean 8.00% 
ensemble 
Proposed Work 4-digit PIN 25 60 k-NN 3.07% 
Proposed Work 4-digit PIN 25 60 k-NN with fuzzy 1.88% 


In the litreture, there are many research works done for keystroke dynamics [25]-[36] from last three 
deacdes, however with the proposed keystroke dynmaics study for feature phones we have proposed k-NN 
classifier with fuzzy logic model to provide enhanced security when data from different users can be overlapping. 
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4. CONCLUSION AND FUTURE WORK 

Behavior biometrics is the future of security domain, and using the keystroke modality can be a 
simplest way to achieve this efficiently and precisely. With the developed study the accuracy of EER 1.88% is 
achieved by training model with 60 samples with 25 users. With proposed study the accuracy rate is increased 
as we used combination of k-NN with fuzzy to improve the results with sufficient samples to train the model. 
Building multiple models for different keywords that are frequent in usage can help us to monitor the user 
while typing in a generic scenario of chatting platform and suspicious operation over the handheld device can 
be tracked and prevented. To increase the scope of this security, other modalities such as touch analytics, 
battery charging patterns and walking patterns of an individual can be explored as future research work for 
mobile phone security under behavioral biometric research scope. For future work we intend to increase the 
concept with smartphone devices and there we would like to develop multimodal framework based on 
keystroke dynamics and user swipe pattern to recognize the user. This can further be connected using cloud 
support to deploy the keystroke machine learning models with better internet connection which help to improve 
the security further by processing the data on the web servers and keeping client devices light and handy. 
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